# Large model compression
## Qwen3 32B FP8 Dynamic
- License: Apache-2.0
- An efficient language model based on Qwen3-32B with FP8 dynamic quantization, significantly reducing memory requirements and improving computational efficiency.
- Tags: Large Language Model, Transformers
- Author: RedHatAI
- 917 · 8
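FP8 *dynamic* quantization computes the scale from each tensor at runtime rather than from an offline calibration set. A minimal numpy sketch of the idea, using simplified E4M3 rounding (subnormals and exponent clipping are ignored; this is an illustration, not the kernel the model actually ships with):

```python
import numpy as np

def fp8_e4m3_dynamic_quant(x):
    """Simulate FP8 (E4M3) dynamic quantization: the per-tensor scale is
    derived from the live tensor's max-abs value at runtime (simplified
    sketch: subnormal values are not modeled)."""
    scale = np.max(np.abs(x)) / 448.0        # 448 = largest E4M3 value
    xs = x / scale
    m, e = np.frexp(xs)                      # xs = m * 2**e, |m| in [0.5, 1)
    q = np.ldexp(np.round(m * 16) / 16, e)   # keep 3 mantissa bits
    return q * scale                         # dequantize back to float

x = np.random.randn(4096).astype(np.float32)
xq = fp8_e4m3_dynamic_quant(x)
rel_err = np.abs(xq - x).max() / np.abs(x).max()
```

Because the scale comes from the tensor itself, dynamic quantization needs no calibration data, at the cost of a max-abs reduction over every quantized tensor.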
## Llama 3.1 Nemotron 70B Instruct HF GGUF
- A model fine-tuned from Meta Llama-3.1-70B-Instruct, optimized with NVIDIA's HelpSteer2 dataset and supporting text-generation tasks.
- Tags: Large Language Model, English
- Author: Mungert
- 1,434 · 3
## Qwq 32B FP8 Dynamic
- License: MIT
- FP8-quantized version of QwQ-32B, reducing storage and memory requirements by 50% through dynamic quantization while maintaining 99.75% of the original model's accuracy.
- Tags: Large Language Model, Transformers
- Author: RedHatAI
- 3,107 · 8
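The 50% figure follows directly from the per-parameter storage cost: BF16/FP16 weights take 2 bytes each, FP8 weights take 1 byte, and the per-tensor scales add negligible overhead. For a 32B-parameter model:

```python
params = 32e9                   # 32 billion weights
bf16_gb = params * 2 / 1e9      # 2 bytes per weight -> 64 GB
fp8_gb = params * 1 / 1e9       # 1 byte per weight  -> 32 GB
saving = 1 - fp8_gb / bf16_gb   # fraction of weight memory saved
```

Activations, KV cache, and runtime buffers sit on top of this, so the end-to-end memory reduction observed in serving is close to, but not exactly, 50%.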
## Molmo 7B O Bnb 4bit
- License: Apache-2.0
- A 4-bit quantized version of Molmo-7B-O, significantly reducing memory requirements and suitable for resource-constrained environments.
- Tags: Large Language Model, Transformers
- Author: cyan2k
- 2,467 · 11
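The "Bnb 4bit" in the name refers to bitsandbytes-style 4-bit quantization, which stores weights block-wise with one scale per block. A simplified numpy sketch of the structure (real bitsandbytes uses an NF4 codebook and fused CUDA kernels; plain absmax int4 rounding is substituted here to keep the example self-contained):

```python
import numpy as np

def quant4_blockwise(x, block=64):
    """Simplified 4-bit absmax block-wise quantization: each block of 64
    weights shares one float scale; values are rounded to ints in [-7, 7]."""
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(xb / scale), -7, 7).astype(np.int8)
    return q, scale

def dequant4(q, scale):
    """Reconstruct float weights from 4-bit codes and per-block scales."""
    return (q * scale).reshape(-1)

x = np.random.randn(1024).astype(np.float32)
q, s = quant4_blockwise(x)
xr = dequant4(q, s)
```

At 4 bits per weight plus one fp32 scale per 64-weight block, storage works out to about 4.5 bits per weight, roughly a 7x reduction from fp32, which is what makes a 7B multimodal model fit on small GPUs.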